Test Equating by Common Items and Common Subjects: Concepts and Applications
نویسنده
چکیده
Since the invention of z-scores (standardized scores), comparison among different tests has been widely conducted by test developers, instructors, educational researchers, and psychometricians. Equating, calibration, and moderation are terms used to describe broad levels of possible comparison among educational assessments (Dorans, 2004; Feuer, Holland, Green, Bertenthal, & Hemphill, 1999; Linn, 1993; Mislevy, 1992). Equating is at one end of the linking continuum, involving the most stringent requirements of equivalence among the assessments and examinee populations to be linked, and compares tests that measure the same construct and have been designed to be equivalent. Less equivalent conditions involve calibration, which compares tests that measure the same construct but vary in design or difficulty, and moderation, which compares tests that measure different constructs. Psychometric approaches to linking assessments include linear equating, equipercentile equating, and item response theory (IRT). This article is a practical guide to conducting IRT test equating in two different scenarios:
منابع مشابه
Test Equating by Common Items and Common Subjects: Concepts and Applications. - Practical Assessment, Research & Evaluation
Since the invention of z-scores (standardized scores), comparison among different tests has been widely conducted by test developers, instructors, educational researchers, and psychometricians. Equating, calibration, and moderation are terms used to describe broad levels of possible comparison among educational assessments (Dorans, 2004; Feuer, Holland, Green, Bertenthal, & Hemphill, 1999; Linn...
متن کاملInvestigating Content and Construct Representation of a Common-item Design When Creating a Vertically Scaled Test
According to the equating guidelines, a set of common items should be a mini version of the total test in terms of content and statistical representation (Kolen & Brennan, 2004). Differences between vertical scaling and equating would suggest that these guidelines may not apply to vertical scaling in the same way that they apply to equating. This study investigated how well the guideline of con...
متن کاملComparison of proficiency in an anesthesiology course across distinct medical student cohorts: psychometric approaches to test equating.
BACKGROUND Examinations are necessary for assessment of student proficiency in medical education, but comparison of achievement across different cohorts in different tests is challenging. We applied psychometric test equating methods to compare student proficiency in two different examinations for a clinical anesthesiology course. METHODS Each examination contained 50 multiple choice items an...
متن کاملExamining the Impact of Drifted Polytomous Anchor Items on Test Characteristic Curve (TCC) Linking and IRT True Score Equating
As part of its nonprofit mission, ETS conducts and disseminates the results of research to advance quality and equity in education and assessment for the benefit of ETS's constituents and the field. To obtain a PDF or a print copy of a report, please visit: Abstract In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often ...
متن کاملA Review of the Effects on IRT Item Parameter Estimates with a Focus on Misbehaving Common Items in Test Equating
Many studies have investigated the topic of change or drift in item parameter estimates in the context of item response theory (IRT). Content effects, such as instructional variation and curricular emphasis, as well as context effects, such as the wording, position, or exposure of an item have been found to impact item parameter estimates. The issue becomes more critical when items with estimat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005